In this notebook I develop and explore an expected points model at the play level for evaluating college football offenses and defenses. The goal of this analysis is to place a value on offensive/defensive plays in terms of their contributions to a team’s expected points.
The data used is from college football games from 2003 to present. Each observation represents one play in a game, in which we know the team, the situation (down, time remaining), and the location on the field (yards to go, yards to reach end zone). We have information about the types of plays called as well in a text field.
For each play in a game, I model the probability of the next scoring event that will occur within the same half. This means the analysis is not at the drive level, but at what I dub the sequence level. Suppose a team has the ball on offense to start the first half. The next scoring event can take on one of seven outcomes: a touchdown, field goal, or safety for the offense; a touchdown, field goal, or safety for the opponent; or no score before the half ends.
If the team on offense drives down and scores a TD/FG, this will end the sequence. If the team on offense does not score but punts or turns the ball over, the sequence will continue with the other team now on offense. The sequence will continue until either one team scores or the half comes to an end. In other words, a sequence begins at a kickoff and ends at the next kickoff (or at the end of the half).
Suppose we have two teams, A and B, playing in a game. Team A receives the opening kickoff, drives for a few plays, and then punts. Team B takes over, which starts drive 2, and they drive for a few plays before also punting. Team A then manages to put together a drive that finally scores.
All plays on these three drives are one sequence. The outcome of this sequence is the points scored by Team A - if they score a touchdown, their points from this sequence are 7 (assuming for now they make the extra point). Team B’s points from this sequence are -7.
When Team A kicks off to Team B to start drive 4, we start our next sequence, which will end either with one team scoring or at the end of the half. We’ll then start over with a new sequence in the second half.
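The labelling rule in this example can be sketched in a few lines of base R. This is only an illustration under assumptions: the toy play log, its column names (`scoring_team`, `score_type`), and the helper `label_next_score()` are hypothetical and are not the notebook's actual data preparation code.

```r
# Toy play log for one half; every value here is made up for illustration.
plays <- data.frame(
  play_id = 1:8,
  half    = 1,
  offense = c("A", "A", "B", "B", "A", "A", "B", "B"),
  # team that scored on this play (NA if nobody scored) and the kind of score
  scoring_team = c(NA, NA, NA, NA, NA, "A", NA, NA),
  score_type   = c(NA, NA, NA, NA, NA, "TD", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: give every play the next score at or after it in the half,
# expressed from the perspective of the team on offense for that play.
label_next_score <- function(df) {
  n <- nrow(df)
  next_team <- rep(NA_character_, n)
  next_type <- rep(NA_character_, n)
  carry_team <- NA_character_
  carry_type <- NA_character_
  # walk backwards so each play inherits the closest score at or after it
  for (i in rev(seq_len(n))) {
    if (!is.na(df$scoring_team[i])) {
      carry_team <- df$scoring_team[i]
      carry_type <- df$score_type[i]
    }
    next_team[i] <- carry_team
    next_type[i] <- carry_type
  }
  df$next_score_event <- ifelse(
    is.na(next_team), "No_Score",
    ifelse(next_team == df$offense, next_type, paste0("Opp_", next_type))
  )
  df
}

# Label each half separately so a score never carries across halftime.
labelled <- do.call(rbind, lapply(split(plays, plays$half), label_next_score))
print(labelled[, c("play_id", "offense", "next_score_event")])
```

In this toy log, Team A's plays before the touchdown are labelled `TD`, Team B's plays in the same sequence are labelled `Opp_TD`, and the plays after the score with no further scoring are labelled `No_Score`.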
Why model the outcome of sequences rather than individual drives? Individual plays have the potential to affect both teams’ chances of scoring, positively or negatively, and we want our model to directly capture this. If an offense turns the ball over at midfield, they are not only hurting their own chances of scoring, they are increasing the other team’s chance of scoring. The value of a play in terms of expected points is a function of how both teams’ probabilities are affected by the outcome.
A team’s expected points is the sum of the probability of each possible scoring event multiplied by the points of that event. For this analysis, I assume that touchdowns equate to 7 rather than 6 points, assuming that extra points will be made. I can later bake in the actual probability of making extra points, but this will be a simplification for now.
For a given play \(i\) for Team \(A\), we can compute Team A’s expected points using the following:
\[ \text{Expected Points}_A = \Pr(\text{TD}) \cdot 7 + \Pr(\text{FG}) \cdot 3 + \Pr(\text{Safety}) \cdot 2 + \Pr(\text{No Score}) \cdot 0 + \Pr(\text{Opp. Safety}) \cdot (-2) + \Pr(\text{Opp. FG}) \cdot (-3) + \Pr(\text{Opp. TD}) \cdot (-7) \]
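As a minimal sketch of this formula in R: the probabilities here are hypothetical numbers chosen only so that each row sums to one, and `expected_points()` is an illustrative helper rather than a function from the notebook.

```r
# Points attached to each next-score outcome, from the offense's perspective
# (touchdowns counted as 7, as assumed in the text).
points <- c(TD = 7, FG = 3, Safety = 2, No_Score = 0,
            Opp_Safety = -2, Opp_FG = -3, Opp_TD = -7)

# Hypothetical predicted probabilities for two plays (each row sums to 1).
probs <- rbind(
  play_1 = c(TD = 0.30, FG = 0.15, Safety = 0.005, No_Score = 0.05,
             Opp_Safety = 0.005, Opp_FG = 0.14, Opp_TD = 0.35),
  play_2 = c(TD = 0.60, FG = 0.25, Safety = 0.00, No_Score = 0.05,
             Opp_Safety = 0.00, Opp_FG = 0.04, Opp_TD = 0.06)
)

# Illustrative helper: probability-weighted sum of points for each row.
expected_points <- function(probs, points) {
  # align columns by name so the order of outcomes cannot silently mismatch
  drop(probs[, names(points), drop = FALSE] %*% points)
}

ep <- expected_points(probs, points)
print(round(ep, 2))
```

The first play is slightly negative because the opponent is a bit more likely to score next, while the second play, with a high touchdown probability, is worth several points.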
How do we get the probabilities of each scoring event? We learn these from historical data by using a model - I train a multinomial logistic regression model on many seasons’ worth of college football plays to learn how situations on the field affect the probability of the next scoring event.
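For reference, the standard form of a multinomial logistic regression is sketched here. The symbols \(x_i\) (the feature vector for play \(i\)), \(\beta_{0k}\), and \(\beta_k\) are notation introduced for illustration and are not taken from the notebook:

\[ \Pr(Y_i = k \mid x_i) = \frac{\exp(\beta_{0k} + x_i^\top \beta_k)}{\sum_{j=1}^{7} \exp(\beta_{0j} + x_i^\top \beta_j)} \]

Here \(k\) indexes the seven possible next scoring events. The seven probabilities sum to one for every play, which is what allows them to be plugged directly into the expected points formula.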
The outcome for our analysis is the NEXT_SCORE_EVENT. Each play in a given sequence contributes to the eventual outcome of the sequence. Here we can see an example of one game and its drives:
SEASON | HOME | AWAY | HALF | DRIVE_NUMBER | HOME_SCORE | AWAY_SCORE | NEXT_SCORE_EVENT |
2012 | Texas A&M | Florida | First Half | 1 | 3 | 0 | HOME FG |
2012 | Texas A&M | Florida | First Half | 2 | 3 | 7 | AWAY TD |
2012 | Texas A&M | Florida | First Half | 3 | 10 | 7 | HOME TD |
2012 | Texas A&M | Florida | First Half | 4 | 10 | 7 | HOME TD |
2012 | Texas A&M | Florida | First Half | 5 | 17 | 7 | HOME TD |
2012 | Texas A&M | Florida | First Half | 6 | 17 | 10 | AWAY FG |
2012 | Texas A&M | Florida | First Half | 7 | 17 | 10 | No_Score |
2012 | Texas A&M | Florida | Second Half | 8 | 17 | 13 | AWAY FG |
2012 | Texas A&M | Florida | Second Half | 9 | 17 | 13 | AWAY TD |
2012 | Texas A&M | Florida | Second Half | 10 | 17 | 13 | AWAY TD |
2012 | Texas A&M | Florida | Second Half | 11 | 17 | 13 | AWAY TD |
2012 | Texas A&M | Florida | Second Half | 12 | 17 | 13 | AWAY TD |
2012 | Texas A&M | Florida | Second Half | 13 | 17 | 13 | AWAY TD |
2012 | Texas A&M | Florida | Second Half | 14 | 17 | 20 | AWAY TD |
2012 | Texas A&M | Florida | Second Half | 15 | 17 | 20 | No_Score |
2012 | Texas A&M | Florida | Second Half | 16 | 17 | 20 | No_Score |
2012 | Texas A&M | Florida | Second Half | 17 | 17 | 20 | No_Score |
2012 | Texas A&M | Florida | Second Half | 18 | 17 | 20 | No_Score |
2012 | Texas A&M | Florida | Second Half | 19 | 17 | 20 | No_Score |
2012 | Texas A&M | Florida | Second Half | 20 | 17 | 20 | No_Score |
For this game, we can filter to the plays that took place in the lead-up to the first scoring event. In this case, the first sequence included one drive and ended when Texas A&M kicked a field goal.
OFFENSE | DEFENSE | DRIVE | DOWN | DISTANCE | YARDS_TO_GOAL | PLAY_TEXT | NEXT_SCORE_EVENT |
Texas A&M | Florida | 1 | 1 | 10 | 75 | TEXAS A&M penalty 5 yard False Start on Patrick Lewis accepted. | FG |
Texas A&M | Florida | 1 | 1 | 15 | 80 | Christine Michael rush for no gain to the TexAM 20. | FG |
Texas A&M | Florida | 1 | 2 | 15 | 80 | Johnny Manziel pass complete to Ryan Swope for a loss of 2 yards to the TexAM 18. | FG |
Texas A&M | Florida | 1 | 3 | 17 | 82 | Johnny Manziel rush for 16 yards to the 50 yard line, FLORIDA penalty 16 yard Personal Foul accepted for a 1ST down. | FG |
Texas A&M | Florida | 1 | 1 | 10 | 50 | Johnny Manziel pass complete to Kenric McNeal for 8 yards to the Fla 42. | FG |
Texas A&M | Florida | 1 | 2 | 2 | 42 | Christine Michael rush for 2 yards to the Fla 40 for a 1ST down. | FG |
Texas A&M | Florida | 1 | 1 | 10 | 40 | Johnny Manziel pass complete to Kenric McNeal for 3 yards to the Fla 37. | FG |
Texas A&M | Florida | 1 | 2 | 7 | 37 | Christine Michael rush for 6 yards to the Fla 31. | FG |
Texas A&M | Florida | 1 | 3 | 1 | 31 | Christine Michael rush for no gain to the Fla 31. | FG |
Texas A&M | Florida | 1 | 4 | 1 | 31 | Johnny Manziel pass complete to Nehemiah Hicks for 8 yards to the Fla 23 for a 1ST down. | FG |
Texas A&M | Florida | 1 | 1 | 10 | 23 | Johnny Manziel pass complete to Ryan Swope for 2 yards to the Fla 21. | FG |
Texas A&M | Florida | 1 | 2 | 8 | 21 | Johnny Manziel pass complete to Christine Michael for 14 yards to the Fla 7 for a 1ST down. | FG |
Texas A&M | Florida | 1 | 1 | 7 | 7 | Johnny Manziel pass incomplete to Kenric McNeal, broken up by Matt Elam. | FG |
Texas A&M | Florida | 1 | 2 | 7 | 7 | Johnny Manziel pass incomplete to Mike Evans, broken up by Marcus Roberson. | FG |
Texas A&M | Florida | 1 | 3 | 7 | 7 | Johnny Manziel rush for 3 yards to the Fla 9, TEXAS A&M penalty 5 yard Illegal Forward Pass on Johnny Manziel accepted. | FG |
Texas A&M | Florida | 1 | 4 | 9 | 9 | Taylor Bertolet 27 yard field goal GOOD. | FG |
Texas A&M | Florida | 1 | -1 | -1 | 65 | Taylor Bertolet kickoff for 65 yards for a touchback. | FG |
If we look at another sequence in the second half, there were multiple drives before a team was able to score in that sequence. The next scoring event is always defined from the perspective of the offense.
OFFENSE | DEFENSE | DRIVE | DOWN | DISTANCE | YARDS_TO_GOAL | PLAY_TEXT | NEXT_SCORE_EVENT |
Texas A&M | Florida | 9 | 1 | 10 | 73 | Christine Michael rush for 6 yards to the TexAM 33. | Opp_TD |
Texas A&M | Florida | 9 | 2 | 4 | 67 | Christine Michael rush for 1 yard to the TexAM 34. | Opp_TD |
Texas A&M | Florida | 9 | 3 | 3 | 66 | Johnny Manziel pass complete to Mike Evans for 1 yard to the TexAM 35. | Opp_TD |
Texas A&M | Florida | 9 | 4 | 2 | 65 | Ryan Epperson punt for 42 yards, fair catch by Andre Debose at the Fla 23. | Opp_TD |
Florida | Texas A&M | 10 | 1 | 10 | 77 | Jeff Driskel pass complete to Frankie Hammond for 10 yards to the Fla 33 for a 1ST down. | TD |
Florida | Texas A&M | 10 | 1 | 10 | 67 | Mike Gillislee rush for 4 yards to the Fla 37. | TD |
Florida | Texas A&M | 10 | 2 | 6 | 63 | Jeff Driskel pass complete to Omarius Hines for 7 yards to the Fla 44 for a 1ST down. | TD |
Florida | Texas A&M | 10 | 1 | 10 | 56 | Mike Gillislee rush for 3 yards to the Fla 47. | TD |
Florida | Texas A&M | 10 | 2 | 7 | 53 | Jeff Driskel sacked by Spencer Nealy for a loss of 1 yard to the Fla 46. | TD |
Florida | Texas A&M | 10 | 3 | 8 | 54 | Jeff Driskel sacked by Sean Porter and Damontre Moore for a loss of 12 yards to the Fla 34. | TD |
Florida | Texas A&M | 10 | 4 | 20 | 66 | Kyle Christy punt for 48 yards, fair catch by Dustin Harris at the TexAM 18. | TD |
Texas A&M | Florida | 11 | 1 | 10 | 82 | Johnny Manziel pass complete to Mike Evans for a loss of 2 yards to the TexAM 16. | Opp_TD |
Texas A&M | Florida | 11 | 2 | 12 | 84 | Trey Williams rush for a loss of 2 yards to the TexAM 14. | Opp_TD |
Texas A&M | Florida | 11 | 3 | 14 | 86 | Johnny Manziel rush for 3 yards to the TexAM 17. | Opp_TD |
Texas A&M | Florida | 11 | 4 | 11 | 83 | Timeout FLORIDA, clock 04:18. | Opp_TD |
Texas A&M | Florida | 11 | 4 | 11 | 83 | Ryan Epperson punt for 53 yards, downed at the Fla 30. | Opp_TD |
Florida | Texas A&M | 12 | 1 | 10 | 70 | Jeff Driskel pass complete to Jordan Reed for 5 yards to the Fla 35. | TD |
Florida | Texas A&M | 12 | 2 | 5 | 65 | Mack Brown rush for no gain to the Fla 35. | TD |
Florida | Texas A&M | 12 | 3 | 5 | 65 | Jeff Driskel rush for 14 yards to the Fla 49 for a 1ST down. | TD |
Florida | Texas A&M | 12 | 1 | 10 | 51 | Matt Jones rush for 5 yards to the TexAM 46. | TD |
Florida | Texas A&M | 12 | 2 | 5 | 46 | Jeff Driskel pass incomplete. | TD |
Florida | Texas A&M | 12 | 3 | 5 | 46 | Jeff Driskel rush for no gain to the TexAM 46. | TD |
Florida | Texas A&M | 12 | 4 | 5 | 46 | Kyle Christy punt for 37 yards, fair catch by Dustin Harris at the TexAM 9. | TD |
Texas A&M | Florida | 13 | 1 | 10 | 91 | Johnny Manziel pass complete to Mike Evans for 14 yards to the TexAM 23 for a 1ST down. | Opp_TD |
Texas A&M | Florida | 13 | 1 | 10 | 77 | TEXAS A&M penalty 11 yard Personal Foul on Kenric McNeal accepted. | Opp_TD |
Texas A&M | Florida | 13 | 1 | 10 | 88 | Johnny Manziel pass complete to Thomas Johnson for 2 yards to the TexAM 14. | Opp_TD |
Texas A&M | Florida | 13 | 2 | 8 | 86 | Johnny Manziel pass complete to Mike Evans for 5 yards to the TexAM 19. | Opp_TD |
Texas A&M | Florida | 13 | 3 | 3 | 81 | Johnny Manziel sacked by Lerentee McCray for a loss of 4 yards to the TexAM 15. | Opp_TD |
Texas A&M | Florida | 13 | 4 | 7 | 85 | Ryan Epperson punt for 47 yards, fair catch by Andre Debose at the Fla 38. | Opp_TD |
Florida | Texas A&M | 14 | 1 | 10 | 62 | Mike Gillislee rush for 5 yards to the Fla 43. | TD |
Florida | Texas A&M | 14 | 2 | 5 | 57 | Jeff Driskel pass complete to Omarius Hines for 39 yards to the TexAM 18 for a 1ST down. | TD |
Florida | Texas A&M | 14 | 1 | 10 | 18 | Solomon Patton rush for 6 yards to the TexAM 12. | TD |
Florida | Texas A&M | 14 | 2 | 4 | 12 | Mike Gillislee rush for 12 yards for a TOUCHDOWN. | TD |
Florida | Texas A&M | 14 | -1 | -1 | 3 | Caleb Sturgis extra point GOOD. | TD |
Florida | Texas A&M | 14 | -1 | -1 | 65 | Caleb Sturgis kickoff for 63 yards returned by Trey Williams for 17 yards to the TexAM 19. | TD |
Our goal is to understand how individual plays contribute to a team’s expected points, or the average points teams should expect to have given their situation (down, time, possession).
For instance, in the first drive of the Texas A&M-Florida game in 2012, Texas A&M received the ball at their own 25 yard line to open the game. The simplest intuition of expected points is to ask, for teams starting at the 25 yard line at the beginning of a game, how many points do they typically go on to score? The answer is to look at all starting drives with 75 yards to go and see what the eventual next scoring event was for each of these plays - we take the average of all of the points that followed from this situation.
NEXT_SCORE_EVENT_OFFENSE | n | prop |
TD | 2,234 | 0.382 |
Opp_TD | 2,021 | 0.346 |
FG | 785 | 0.134 |
Opp_FG | 721 | 0.123 |
No_Score | 41 | 0.007 |
Opp_Safety | 24 | 0.004 |
Safety | 20 | 0.003 |
YARDS_TO_GOAL | DRIVE_NUMBER | expected_points | n |
75 | 1 | 0.33 | 5,898 |
In this case, this means teams with the ball at their own 25 to start the game generally obtained more points on the ensuing sequence than their opponents, so they have a slightly positive expected points.
But this is also a function of the down. If we compare the expected points for a team in this situation on first down with a team in the same spot on fourth down, we should see a drop - by the time you hit fourth down, if you haven’t moved from the 25, your expected points drop into the negatives, as you will now be punting the ball back to your opponent and it becomes more probable that they score than you.
YARDS_TO_GOAL | NEXT_SCORE_EVENT_OFFENSE | DOWN_1 | DOWN_2 | DOWN_3 | DOWN_4 |
75 | Opp_TD | 1,401 | 374 | 157 | 95 |
75 | TD | 1,756 | 350 | 129 | 44 |
75 | Opp_FG | 491 | 135 | 57 | 38 |
75 | FG | 602 | 122 | 45 | 17 |
75 | No_Score | 23 | 8 | 4 | 6 |
75 | Opp_Safety | 13 | 6 | 3 | 2 |
75 | Safety | 12 | 3 | 4 | 1 |
YARDS_TO_GOAL | DOWN | expected_points | n |
75 | 1 | 0.66 | 4,299 |
75 | 2 | -0.21 | 999 |
75 | 3 | -0.58 | 399 |
75 | 4 | -2.08 | 203 |
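The expected points by down can be recomputed directly from the counts of next scoring events at 75 yards to goal. This is a minimal sketch that retypes the counts from the table above by hand; the object names (`counts`, `ep_by_down`) are illustrative.

```r
# Counts of next scoring events at 75 yards to goal, by down (copied from the table).
counts <- rbind(
  Opp_TD     = c(1401, 374, 157, 95),
  TD         = c(1756, 350, 129, 44),
  Opp_FG     = c(491, 135, 57, 38),
  FG         = c(602, 122, 45, 17),
  No_Score   = c(23, 8, 4, 6),
  Opp_Safety = c(13, 6, 3, 2),
  Safety     = c(12, 3, 4, 1)
)
colnames(counts) <- paste0("DOWN_", 1:4)

# Points for each outcome from the offense's perspective (TD counted as 7).
points <- c(TD = 7, FG = 3, Safety = 2, No_Score = 0,
            Opp_Safety = -2, Opp_FG = -3, Opp_TD = -7)

# Turn counts into proportions within each down, then weight by points.
props <- sweep(counts, 2, colSums(counts), "/")
ep_by_down <- colSums(props * points[rownames(props)])

print(round(ep_by_down, 2))
```

Rounded to two decimals, this reproduces the expected points column of the table, which is just the empirical average of the points that followed each situation.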
Because expected points change with the down and yard line, we can look at the difference in expected points from play to play - the change in expected points as the situation changes gives us the Expected Points Added by a single play.
For any given play, we get a sense of the expected points a team can expect from their situation. For instance, if we look at all plays in a game, how do expected points vary as a function of a team’s distance from their opponent’s goal line?
This should make sense - if you’re backed up against your own end zone, your opponent has higher expected points because they are, historically, more likely to have the next scoring event, either by gaining good field advantage after you punt or by getting a safety. We can see this if we just look at the proportion of next scoring events based on the offense’s position on the field.
From this, when we see an offense move the ball up the field on a given play, we will generally see their expected points go up. The difference in expected points before the snap and after the snap is the value added (positively or negatively) by the play.
But it’s not just position on the field - it’s also about the situation. If we look at how expected points vary by down, we should see that fourth downs have lower expected points.
We also have other features like distance to convert the first down (filtering here to plays with a maximum of 30 yards to go, as we start to run out of data at higher values and it looks wonky).
And we also have info on time remaining in the half - as we might expect, the proportion of drives leading to no scoring goes up as the amount of time remaining in the half goes down.
We use all of this historical data to learn the expected points from a given situation, then look at the difference in expected points from play to play - this is the intuition behind how we will value individual plays, which we can then roll up to the offense/defense/game/season level.
How do these various features like down, distance, yards to goal, and time remaining affect the probability of the next scoring event? We use a model to learn this relationship from historical plays. I’ll now proceed to building the model which I’ll use for the bulk of the analysis.
I’ll set up training, validation, and test sets based around the season. I’m mostly going to build the model using plays from the 2010 season onwards, as the quality of the play-by-play data gets worse the further back we go, though I’ll do some backtesting of the model on older seasons.
I’m going to use the seasons of 2010-2018 as my main training set, building and evaluating the model using a leave-one-season-out approach, akin to k-fold cross validation using seasons as the folds. I’ll use the 2019-2020 seasons as a validation set, and leave 2021 as my test set, which I won’t look at till later on.
# full plays
plays_full = plays_data_score_events %>%
filter(PLAY_TYPE != 'Kickoff') %>%
select(GAME_ID,
DRIVE_ID,
PLAY_ID,
SEASON,
HOME,
AWAY,
OFFENSE,
DEFENSE,
OFFENSE_SCORE,
DEFENSE_SCORE,
SCORING,
PLAY_TEXT,
PLAY_TYPE,
NEXT_SCORE_EVENT_HOME,
NEXT_SCORE_EVENT_HOME_DIFF,
NEXT_SCORE_EVENT_OFFENSE,
NEXT_SCORE_EVENT_OFFENSE_DIFF,
YARD_LINE,
HALF,
PERIOD,
MINUTES_IN_HALF,
SECONDS_IN_HALF,
DOWN,
DISTANCE,
YARDS_TO_GOAL) %>%
filter(DOWN %in% c(1, 2, 3, 4)) %>%
filter(PERIOD %in% c(1,2,3,4)) %>%
filter(!is.na(SECONDS_IN_HALF)) %>%
filter(DISTANCE >=0 & DISTANCE <=100) %>%
filter(!is.na(NEXT_SCORE_EVENT_OFFENSE)) %>%
mutate(NEXT_SCORE_EVENT_OFFENSE = factor(NEXT_SCORE_EVENT_OFFENSE,
levels = c("No_Score",
"TD",
"FG",
"Safety",
"Opp_Safety",
"Opp_FG",
"Opp_TD"))) %>%
arrange(SEASON, GAME_ID, PLAY_ID)
# training set
plays_train = plays_full %>%
filter(SEASON >= 2010 & SEASON <2019)
# validation set
plays_valid = plays_full %>%
filter(SEASON >= 2019 & SEASON <= 2020)
# test
plays_test = plays_full %>%
filter(SEASON > 2020)
# make an initial split based on previously defined splits
valid_split = make_splits(list(analysis = seq(nrow(plays_train)),
assessment = nrow(plays_train) + seq(nrow(plays_valid))),
bind_rows(plays_train,
plays_valid))
# test split
test_split = make_splits(
list(analysis = seq(nrow(plays_train) + nrow(plays_valid)),
assessment = nrow(plays_train) + nrow(plays_valid) + seq(nrow(plays_test))),
bind_rows(plays_train,
plays_valid,
plays_test))
The outcome is the next scoring event, always defined from the perspective of the offense for any given play.
I currently use the following as features for plays in a baseline model: down, distance to the first down (log-transformed), yards to goal, seconds remaining in the half, the period, and an indicator for goal-to-go situations (distance equal to yards to goal).
I also include interactions between down and distance, down and yards to end zone, and yards to end zone and seconds remaining. This baseline model doesn’t account for things like offense/defense quality or scoring effects; I’ll train other models later on with these features to get things like ‘Defense Adjusted Expected Points’. But this analysis is focused first and foremost on estimating probabilities.
I’ll now set up a recipe for the baseline model.
baseline_recipe = recipe(NEXT_SCORE_EVENT_OFFENSE ~.,
data = plays_train) %>%
update_role(all_predictors(),
new_role = "ID") %>%
update_role(
c("GAME_ID",
"DRIVE_ID",
"PLAY_ID",
"SEASON",
"HOME",
"AWAY",
"OFFENSE",
"DEFENSE",
"SCORING",
"OFFENSE_SCORE",
"DEFENSE_SCORE",
"PLAY_TEXT",
"PLAY_TYPE",
"NEXT_SCORE_EVENT_HOME",
"NEXT_SCORE_EVENT_HOME_DIFF",
"NEXT_SCORE_EVENT_OFFENSE_DIFF",
"YARD_LINE",
"MINUTES_IN_HALF",
"HALF"),
new_role = "ID") %>%
step_mutate(PERIOD_ID = PERIOD,
role = "ID") %>%
# features we're inheriting
update_role(
c("PERIOD",
"SECONDS_IN_HALF",
"DOWN",
"DISTANCE",
"YARDS_TO_GOAL"),
new_role = "predictor") %>%
# filters for issues
step_filter(!is.na(NEXT_SCORE_EVENT_OFFENSE)) %>%
step_filter(YARD_LINE <= 100 & YARD_LINE >=0) %>%
step_filter(YARDS_TO_GOAL <= 100 & YARDS_TO_GOAL >= 0) %>%
step_filter(DOWN %in% c(1, 2, 3, 4)) %>%
step_filter(DISTANCE >=0 & DISTANCE <=100) %>%
step_filter(SECONDS_IN_HALF <=1800) %>%
step_filter(!is.na(SECONDS_IN_HALF)) %>%
step_filter(PERIOD_ID == 1 | PERIOD_ID == 2 | PERIOD_ID == 3 | PERIOD_ID == 4) %>%
# create features
step_mutate(KICKOFF = case_when(grepl("kickoff", tolower(PLAY_TEXT)) | grepl("kickoff", tolower(PLAY_TYPE))==T ~ 1,
TRUE ~ 0)) %>%
step_mutate(TIMEOUT = case_when(grepl("timeout", tolower(PLAY_TEXT)) ~ 1,
TRUE ~ 0)) %>%
step_filter(TIMEOUT != 1) %>%
step_filter(KICKOFF != 1) %>%
step_mutate(DOWN_TO_GOAL = case_when(DISTANCE == YARDS_TO_GOAL ~ 1,
TRUE ~ 0)) %>%
step_mutate(DOWN = factor(DOWN)) %>%
step_mutate(PERIOD = factor(PERIOD)) %>%
step_log(DISTANCE, offset =1) %>%
step_dummy(all_nominal_predictors()) %>%
step_novel(all_nominal_predictors(),
new_level = "new") %>%
step_interact(terms = ~ DISTANCE:(starts_with("DOWN_"))) %>%
step_interact(terms = ~ YARDS_TO_GOAL:(starts_with("DOWN_"))) %>%
step_interact(terms = ~ YARDS_TO_GOAL*SECONDS_IN_HALF) %>%
check_missing(all_predictors()) %>%
step_normalize(all_numeric_predictors())
I’ll define the model I’ll be using here, which is a multinomial logistic regression.
# from glmnet
multinom_mod = multinom_reg(
mode = "classification",
engine = "glmnet",
penalty = 0,
mixture = NULL
)
I’ll then create a workflow.
# create multinomial workflow
multinom_wf = workflow() %>%
add_recipe(baseline_recipe) %>%
add_model(multinom_mod)
# workflow settings
# metrics
class_metrics<-metric_set(yardstick::roc_auc,
yardstick::mn_log_loss)
# control for resamples
keep_pred <- control_resamples(save_pred = TRUE,
save_workflow = TRUE,
allow_par=T)
I’ll manually define resamples based on the seasons - rather than doing k-fold cross validation, I’ll assign each season to be a fold and train and assess the model leaving one season out at a time.
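One way to build season-based folds like these is with `make_splits()` and `manual_rset()` from rsample. This is a sketch under assumptions, not necessarily how `manual_resamples` was built in this notebook: `toy_plays` and `season_folds()` are hypothetical names, and the fold ids are simply chosen to mirror the "holdout: <season>" labels in the results.

```r
library(rsample)

# Tiny stand-in for the training plays; only SEASON matters for the folds.
toy_plays <- data.frame(
  SEASON = rep(2010:2012, each = 4),
  YARDS_TO_GOAL = seq(5, 60, by = 5)
)

# Hypothetical helper: one fold per season, holding that season out for assessment.
season_folds <- function(df) {
  seasons <- sort(unique(df$SEASON))
  splits <- lapply(seasons, function(s) {
    make_splits(
      list(analysis   = which(df$SEASON != s),
           assessment = which(df$SEASON == s)),
      data = df
    )
  })
  manual_rset(splits, ids = paste("holdout:", seasons))
}

toy_resamples <- season_folds(toy_plays)
print(toy_resamples)
```

In the notebook's setting the same helper would be applied to `plays_train`. rsample's `group_vfold_cv()` with `group = SEASON` is another option that should give equivalent leave-one-season-out folds, though with generic fold ids.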
Now I’ll train and assess the model on the training set via leave-one-season-out cross validation, then I’ll refit the model to the entire training set.
# fit to resamples
# fit_resamples() takes its data from the resamples object,
# so no separate data argument is passed here
resamples_multinom = multinom_wf %>%
  fit_resamples(resamples = manual_resamples,
                metrics = class_metrics,
                control = keep_pred)
## ℹ The workflow being saved contains a recipe, which is 362.9 Mb in
## ℹ memory. If this was not intentional, please set the control setting
## ℹ `save_workflow = FALSE`.
# # # save locally so as to not need to retrain everytime
# write_rds(resamples_multinom,
# file = here::here("models", "resamples_expected_points.Rds"),
# compress = "gz")
# fit the model to the whole training set
fit_multinom = multinom_wf %>%
fit(data = plays_train)
How did the model perform on each resample? I’m looking at the log loss and the area under the receiver operating characteristic curve (roc_auc). The log loss is only meaningful by comparison, so I’ll compare the model against a null model on the validation set.
id | .metric | .estimator | .estimate |
holdout: 2010 | mn_log_loss | multiclass | 1.241 |
holdout: 2011 | mn_log_loss | multiclass | 1.235 |
holdout: 2012 | mn_log_loss | multiclass | 1.236 |
holdout: 2013 | mn_log_loss | multiclass | 1.242 |
holdout: 2014 | mn_log_loss | multiclass | 1.237 |
holdout: 2015 | mn_log_loss | multiclass | 1.244 |
holdout: 2016 | mn_log_loss | multiclass | 1.227 |
holdout: 2017 | mn_log_loss | multiclass | 1.236 |
holdout: 2018 | mn_log_loss | multiclass | 1.226 |
holdout: 2010 | roc_auc | hand_till | 0.713 |
holdout: 2011 | roc_auc | hand_till | 0.704 |
holdout: 2012 | roc_auc | hand_till | 0.703 |
holdout: 2013 | roc_auc | hand_till | 0.708 |
holdout: 2014 | roc_auc | hand_till | 0.694 |
holdout: 2015 | roc_auc | hand_till | 0.708 |
holdout: 2016 | roc_auc | hand_till | 0.707 |
holdout: 2017 | roc_auc | hand_till | 0.703 |
holdout: 2018 | roc_auc | hand_till | 0.715 |
Understanding partial effects from a multinomial logit is already difficult, and I’ve thrown a bunch of interactions in there to make this even more difficult.
I’ll look at predicted probabilities using an observed values approach for particular features (using a sample rather than the full dataset to save time). This means taking the model and then altering the feature of interest for every observation and taking the average predicted probability for each outcome across all observations.
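The observed values approach can be sketched as follows. Everything here is a stand-in for illustration: the data are simulated, the model is fit with `nnet::multinom()` rather than the glmnet workflow used in this notebook, and `observed_values()` is a hypothetical helper.

```r
library(nnet)  # stand-in engine for this sketch; the notebook itself uses glmnet via tidymodels

set.seed(1)
n <- 600
sim <- data.frame(
  YARDS_TO_GOAL = sample(1:99, n, replace = TRUE),
  DOWN = factor(sample(1:4, n, replace = TRUE))
)
# Made-up outcome: a touchdown gets less likely the farther the offense is from the end zone.
p_td <- plogis(1 - 0.03 * sim$YARDS_TO_GOAL)
u <- runif(n)
sim$NEXT_SCORE <- factor(ifelse(u < p_td, "TD",
                         ifelse(u < p_td + (1 - p_td) / 2, "Opp_TD", "No_Score")))

fit <- multinom(NEXT_SCORE ~ YARDS_TO_GOAL + DOWN, data = sim, trace = FALSE)

# For each grid value: set the feature to that value for every observation,
# predict class probabilities, and average them across all observations.
observed_values <- function(fit, data, feature, grid) {
  rows <- lapply(grid, function(v) {
    altered <- data
    altered[[feature]] <- v
    colMeans(predict(fit, newdata = altered, type = "probs"))
  })
  data.frame(value = grid, do.call(rbind, rows))
}

profile <- observed_values(fit, sim, "YARDS_TO_GOAL", c(10, 50, 90))
print(profile)
```

Each row of `profile` is the average predicted probability of every outcome when all plays are moved to that yard line while keeping their other observed features, which is why this is cheaper to run on a sample than on the full dataset.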
How is the probability of the next scoring event influenced by where the offense has possession?
How is this affected by the down?
How does this translate into expected points?
We can evaluate the model via our leave-one-season-out approach, but we’ll also predict the validation set as an additional check. I’ll compare performance relative to a null model that simply predicts the incidence rate of each outcome in the training set.
method | .metric | .estimator | .estimate |
multinom | roc_auc | hand_till | 0.703 |
null | roc_auc | hand_till | 0.500 |
method | .metric | .estimator | .estimate |
multinom | mn_log_loss | multiclass | 1.227 |
null | mn_log_loss | multiclass | 1.475 |
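To make the null-model comparison concrete, here is a minimal base R sketch of multiclass log loss for a model that always predicts the training incidence rates. The outcomes are toy values, and this hand-written `mn_log_loss()` is an illustrative stand-in for the yardstick metric of the same name, not a copy of it.

```r
# Illustrative multiclass log loss: mean negative log of the probability
# assigned to the class that actually occurred.
mn_log_loss <- function(truth, probs) {
  # probs: matrix with one named column per class; truth: observed classes
  idx <- cbind(seq_along(truth), match(as.character(truth), colnames(probs)))
  -mean(log(probs[idx]))
}

# Toy outcomes, made up for illustration.
train_outcomes <- c("TD", "TD", "FG", "No_Score")
valid_outcomes <- c("TD", "FG")

# Null model: every validation play gets the training incidence rates.
rates <- prop.table(table(train_outcomes))
null_probs <- matrix(
  rep(as.numeric(rates), each = length(valid_outcomes)),
  nrow = length(valid_outcomes),
  dimnames = list(NULL, names(rates))
)

null_loss <- mn_log_loss(valid_outcomes, null_probs)
print(null_loss)
```

A fitted model is useful to the extent that its log loss falls below this baseline, since the null model uses no information about the situation on the field.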
What’s the log loss for each outcome?
NEXT_SCORE_EVENT_OFFENSE | .metric | .estimator | multinom | null |
No_Score | mn_log_loss | multiclass | 1.035 | 1.886 |
TD | mn_log_loss | multiclass | 0.816 | 0.902 |
FG | mn_log_loss | multiclass | 1.660 | 1.806 |
Safety | mn_log_loss | multiclass | 5.704 | 5.753 |
Opp_Safety | mn_log_loss | multiclass | 5.619 | 5.956 |
Opp_FG | mn_log_loss | multiclass | 2.522 | 2.751 |
Opp_TD | mn_log_loss | multiclass | 1.344 | 1.568 |
I’ll now start diving into the predictions for individual plays as a means to evaluate plays and teams.
It’s worth noting that we might see some season-level differences that make comparison across seasons difficult, since the predictions are all coming from slightly different models due to resampling.
I’ll get Expected Points Added for all non-scoring plays. This part can be a little wobbly, due to data quality issues with defining sequences. The basic idea: at the start of a play, we know the expected points for a team in that situation, EP_Pre. We then look to the next play to see the expected points for the team after the result of the play, EP_Post. EP_Added is the difference between the two, EP_Post minus EP_Pre, from the perspective of the offense.
This means that if the ball is turned over without a score, the team on offense becomes the defense, and the sign of the expected points on the next play flips for their calculation. For scoring plays, EP_Added is empty, but I create another feature simply called Points_Added, which is the points scored on the play (7 for touchdowns, 3 for FGs, 2 for safeties) minus EP_Pre.
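These definitions can be sketched as three small functions. The function names are hypothetical, and treating the combined metric as "EP_Added where it exists, otherwise Points_Added" is an assumption about how the two are condensed. The numbers come from the Texas A&M vs Alabama tables in this notebook: the 1.50 is the new offense's expected points on the next play, inferred from the EP_Post of -1.50 on the Alabama fumble.

```r
# Expected points added on a non-scoring play, from the offense's perspective.
# ep_next is the expected points of whoever has the ball on the following play,
# so its sign flips when possession changed hands.
ep_added <- function(ep_pre, ep_next, same_offense) {
  ep_post <- ifelse(same_offense, ep_next, -ep_next)
  ep_post - ep_pre
}

# Points added on a scoring play: actual points minus what was expected.
points_added <- function(ep_pre, points_scored) {
  points_scored - ep_pre
}

# Assumed combination: use EP_Added when available, otherwise Points_Added.
predicted_points_added <- function(epa, pa) {
  ifelse(is.na(epa), pa, epa)
}

# Alabama fumble: EP_Pre 3.46, then Texas A&M takes over with 1.50 expected points.
fumble_epa <- ep_added(ep_pre = 3.46, ep_next = 1.50, same_offense = FALSE)

# 54 yard touchdown pass from an EP_Pre of 0.92.
td_pa <- points_added(ep_pre = 0.92, points_scored = 7)

print(round(c(fumble_epa = fumble_epa, td_pa = td_pa), 2))
```

Both values match the corresponding rows of the tables (an EP_Added of -4.96 for the fumble and a Points_Added of 6.08 for the touchdown).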
I’ll look at a few games, play by play, to get a sense of how this is looking. I’ll pick one game completely at random, in no way influenced by my fandom: Texas A&M vs Alabama in 2012.
What were the most impactful plays in this game in terms of expected points added? I’ll look at the 15 plays in this game with the largest absolute change.
SEASON | PERIOD | OFFENSE | DEFENSE | DOWN | DISTANCE | YTG | PLAY | EP_Pre | EP_Post | EP_Added |
2012 | 4 | Alabama | Texas A&M | 1 | 10 | 38 | T.J. Yeldon rush for 8 yards to the TexAM 30, fumbled, forced by Steven Terrell, recovered by TexAM Dustin Harris, Dustin Harris for 4 yards to the TexAM 34. | 3.46 | -1.50 | -4.96 |
2012 | 1 | Alabama | Texas A&M | 3 | 5 | 52 | AJ McCarron pass intercepted by Sean Porter at the TexAM 43, returned for 16 yards to the Alab 41. | 1.26 | -3.38 | -4.64 |
2012 | 4 | Texas A&M | Alabama | 3 | 8 | 68 | Johnny Manziel pass complete to Ryan Swope for 28 yards to the Alab 25, ALABAMA penalty 15 yard Personal Foul on Ha'Sean Clinton-Dix accepted for a 1ST down. | -0.14 | 4.36 | 4.50 |
2012 | 4 | Alabama | Texas A&M | 1 | 18 | 88 | AJ McCarron pass complete to Amari Cooper for 50 yards to the TexAM 38 for a 1ST down. | -0.52 | 3.46 | 3.97 |
2012 | 4 | Alabama | Texas A&M | 1 | 10 | 60 | AJ McCarron pass complete to Kenny Bell for 54 yards to the TexAM 6 for a 1ST down. | 1.20 | 4.99 | 3.79 |
2012 | 1 | Texas A&M | Alabama | 3 | 6 | 59 | Johnny Manziel rush for 32 yards to the Alab 27 for a 1ST down. | 0.75 | 4.27 | 3.52 |
2012 | 2 | Alabama | Texas A&M | 4 | 4 | 34 | AJ McCarron pass complete to Eddie Lacy for 4 yards to the TexAM 30 for a 1ST down. | 1.26 | 4.10 | 2.83 |
2012 | 3 | Alabama | Texas A&M | 3 | 10 | 40 | AJ McCarron pass complete to Eddie Lacy for 21 yards to the TexAM 19 for a 1ST down. | 1.73 | 4.53 | 2.79 |
2012 | 4 | Alabama | Texas A&M | 4 | 2 | 2 | AJ McCarron pass intercepted by Deshazor Everett at the TexAM 0, returned for 4 yards to the TexAM 4. | 2.85 | 0.15 | -2.70 |
2012 | 4 | Texas A&M | Alabama | 1 | 10 | 66 | Johnny Manziel pass complete to Ryan Swope for 42 yards to the Alab 24 for a 1ST down. | 1.50 | 4.13 | 2.63 |
2012 | 1 | Texas A&M | Alabama | 3 | 5 | 10 | Johnny Manziel pass complete to Thomas Johnson for 8 yards to the Alab 2 for a 1ST down. | 3.82 | 6.29 | 2.47 |
2012 | 2 | Texas A&M | Alabama | 4 | 6 | 37 | Johnny Manziel rush for 5 yards to the Alab 32. | 1.00 | -1.22 | -2.22 |
2012 | 2 | Texas A&M | Alabama | 3 | 4 | 62 | Johnny Manziel pass complete to Ryan Swope for 10 yards to the TexAM 48 for a 1ST down. | 0.78 | 2.65 | 1.87 |
2012 | 1 | Texas A&M | Alabama | 2 | 7 | 43 | Johnny Manziel rush for 29 yards to the Alab 14 for a 1ST down. | 2.62 | 4.49 | 1.87 |
2012 | 3 | Alabama | Texas A&M | 3 | 5 | 60 | AJ McCarron pass incomplete, broken up by Julien Obioha. | 0.43 | -1.43 | -1.86 |
Kind of interesting - this game is remembered for a lot of plays by Johnny Manziel, but the most impactful plays in the game in terms of expected points changes were actually turnovers forced by the A&M defense.
However, this doesn’t include scoring plays. The Points_Added measure looks at the difference between actual points scored vs expected points from the situation.
SEASON | PERIOD | OFFENSE | DEFENSE | DOWN | DISTANCE | YTG | PLAY | EP_Pre | Points_Added |
2012 | 4 | Alabama | Texas A&M | 3 | 6 | 54 | AJ McCarron pass complete to Amari Cooper for 54 yards for a TOUCHDOWN. | 0.92 | 6.08 |
2012 | 1 | Texas A&M | Alabama | 3 | 10 | 10 | Johnny Manziel pass complete to Ryan Swope for 10 yards for a TOUCHDOWN. | 3.69 | 3.31 |
2012 | 4 | Texas A&M | Alabama | 1 | 10 | 24 | Johnny Manziel pass complete to Malcome Kennedy for 24 yards for a TOUCHDOWN. | 4.13 | 2.87 |
2012 | 2 | Alabama | Texas A&M | 3 | 2 | 2 | Eddie Lacy rush for 2 yards for a TOUCHDOWN. | 4.86 | 2.14 |
2012 | 1 | Texas A&M | Alabama | 3 | 1 | 1 | Christine Michael rush for 1 yard for a TOUCHDOWN. | 5.87 | 1.13 |
In this case, we can still use the expected points of the situation to give us a measure of how many points the play added. In the case of Christine Michael’s 1 yard TD run, the points added are pretty low, as you’d expect teams to score from that situation. By comparison, AJ McCarron’s 54 yard pass on 3rd and 6 in the 4th quarter had a much higher points added, as Alabama was in a third down situation near midfield and made a huge play. For this reason, what I’m calling Points Added has been dubbed by others as a measure of a team’s offensive explosiveness. For most of my analyses I’m condensing Expected Points Added and Points Added into one metric that I’m calling Predicted Points Added.
What’s cool is that we can do this (right now) for any play from any game from 2010-2020. For instance, we can look at a team and ask, what were its highest predicted points added plays in each season? These are Wisconsin’s top plays for each year.
SEASON | PERIOD | OFFENSE | DEFENSE | DOWN | DISTANCE | YTG | PLAY | EP_Pre | EP_Post | EPA | PA | PPA |
2010 | 4 | Wisconsin | Indiana | 3 | 6 | 74 | Jon Budmayr pass complete to Jared Abbrederis for 74 yards for a TOUCHDOWN. | -0.21 | | | 7.21 | 7.21 |
2011 | 4 | Wisconsin | Penn State | 4 | 17 | 77 | Brad Nortman punt for 33 yards, returned by Drew Astorino for no gain, fumbled at the PnSt 44. | -2.72 | 3.47 | 6.19 | | 6.19 |
2012 | 2 | Wisconsin | Indiana | 3 | 16 | 69 | James White rush for 69 yards for a TOUCHDOWN. | -0.22 | | | 7.22 | 7.22 |
2013 | 1 | Wisconsin | Indiana | 1 | 10 | 93 | James White rush for 93 yards for a TOUCHDOWN. | -0.78 | | | 7.78 | 7.78 |
2014 | 3 | Wisconsin | Iowa | 1 | 10 | 92 | Melvin Gordon run for 88 yds to the Iowa 4 for a 1ST down | -0.67 | 5.87 | 6.53 | | 6.53 |
2015 | 1 | Wisconsin | Maryland | 4 | 1 | 78 | Joe Schobert run for 57 yds to the Mary 21 for a 1ST down | -1.44 | 4.50 | 5.94 | | 5.94 |
2016 | 1 | Wisconsin | Penn State | 2 | 5 | 67 | Corey Clement run for 67 yds for a TD, (Andrew Endicott KICK) | 1.36 | | | 5.64 | 5.64 |
2017 | 2 | Wisconsin | Nebraska | 1 | 10 | 75 | Jonathan Taylor run for 75 yds for a TD, (Rafael Gaglianone KICK) | 0.45 | | | 6.55 | 6.55 |
2018 | 4 | Wisconsin | Nebraska | 1 | 10 | 88 | Jonathan Taylor run for 88 yds for a TD (Rafael Gaglianone KICK) | -0.24 | | | 7.24 | 7.24 |
2019 | 4 | Wisconsin | Minnesota | 3 | 6 | 81 | Jack Coan pass complete to Garrett Groshek for 70 yds to the Minn 11 for a 1ST down | -0.94 | 4.89 | 5.83 | | 5.83 |
2020 | 2 | Wisconsin | Illinois | 1 | 10 | 53 | Danny Davis 53 Yd pass from Graham Mertz (Collin Larsh Kick) | 1.55 | | | 5.45 | 5.45 |
What were the highest predicted points added plays (including scoring plays) from each season?
SEASON | PERIOD | OFFENSE | DEFENSE | DOWN | DISTANCE | YTG | PLAY | EP_Pre | EP_Post | EPA | PA | PPA |
2010 | 1 | Ohio | Wofford | 4 | 10 | 78 | Paul Hershey punt for 40 yards, returned by Brenton Bersin, fumbled, forced by Jeremy LaVoie, recovered by Ohio Julian Posey at the Woffd 38, Julian Posey for 38 yards, to the Woffd 0 for a TOUCHDOWN. | -2.66 | | | 9.66 | 9.66 |
2011 | 1 | California | Washington | 3 | 20 | 90 | Zach Maynard pass complete to Keenan Allen for 90 yards for a TOUCHDOWN. | -2.94 | | | 9.94 | 9.94 |
2012 | 3 | Fresno State | New Mexico | 3 | 15 | 89 | Derek Carr pass complete to Davante Adams for 89 yards for a TOUCHDOWN. | -2.66 | | | 9.66 | 9.66 |
2013 | 3 | Clemson | Virginia | 3 | 15 | 96 | Tajh Boyd pass complete to Sammy Watkins for 96 yards for a TOUCHDOWN. | -3.17 | | | 10.17 | 10.17 |
2014 | 3 | Iowa | Northern Iowa | 3 | 5 | 94 | Jake Rudock pass complete to Tevaun Smith for 6 yds for a TD, (Marshall Koehn KICK) | -2.25 | | | 9.25 | 9.25 |
2015 | 1 | BYU | Boise State | 3 | 19 | 84 | Tanner Mangum pass complete to Mitchell Juergens for 84 yds for a TD, (Trevor Samson KICK) | -2.63 | | | 9.63 | 9.63 |
2016 | 3 | UNLV | Fresno State | 3 | 11 | 91 | Dalton Sneed run for 91 yds for a TD, (Evan Pantels KICK) | -2.56 | | | 9.56 | 9.56 |
2017 | 1 | San Diego State | Arizona State | 3 | 6 | 95 | Rashaad Penny run for 95 yds for a TD, (John Baron II KICK) | -2.39 | | | 9.39 | 9.39 |
2018 | 3 | Colorado | Colorado State | 3 | 14 | 89 | Steven Montez pass complete to Laviska Shenault Jr. for 89 yds for a TD (James Stefanou KICK) | -2.72 | | | 9.72 | 9.72 |
2019 | 1 | Tennessee State | Middle Tennessee | 3 | 26 | 96 | Cameron Rosendahl pass complete to Chris Rowland for 96 yds for a TD, (Antonio Zita PAT BLOCKED) | -3.43 | | | 10.43 | 10.43 |
2020 | 2 | SMU | Memphis | 3 | 15 | 85 | Shane Buechele pass complete to Reggie Roberson Jr. for 85 yds for a TD (Chris Naggar KICK) | -2.21 | | | 9.21 | 9.21 |
Looks like there’s a data quality issue with the Iowa vs Northern Iowa game: the yards to go looks to be incorrect there (94, on a play listed as a 6-yard touchdown), which is making that play look like it was a big yardage gain. We also notice some data quality issues if we look at the highest expected points added plays, not including scoring plays.
SEASON | PERIOD | OFFENSE | DEFENSE | DOWN | DISTANCE | YTG | PLAY | EP_Pre | EP_Post | EPA | PA | PPA |
2010 | 4 | Bowling Green | Miami (OH) | 4 | 10 | 76 | Matt Schilz pass complete to Willie Geter for a loss of 1 yard to the BwGrn 23. | -2.30 | 6.57 | 8.88 | | 8.88 |
2011 | 3 | Troy | Clemson | 4 | 58 | 93 | Brynden Trawick pass complete to Michael Taylor for a loss of 7 yards to the Troy 0. | -3.73 | 5.70 | 9.43 | | 9.43 |
2012 | 3 | South Florida | Florida State | 2 | 100 | 98 | B.J. Daniels rush for 1 yard to the FlaSt 1. | -3.71 | 5.76 | 9.47 | | 9.47 |
2013 | 1 | Wyoming | Fresno State | 3 | 7 | 80 | Tedder Easton rush for 79 yards to the FrsSt 1 for a 1ST down. | -1.38 | 6.48 | 7.86 | | 7.86 |
2014 | 3 | Fresno State | San Diego State | 4 | 4 | 100 | BURRELL, Brian pass incomplete to HARPER, Josh, PENALTY SDSU pass interference (SMITH, Malik) 10 yards to the SDSU13, 1ST DOWN FS, NO PLAY, PENALTY SDSU unsportsmanlike conduct (SMITH, Malik) 9 yards to the SDSU4, 1ST DOWN FS, NO PLAY. for a 1ST down | -3.27 | 5.86 | 9.13 | | 9.13 |
2015 | 2 | San José State | San Diego State | 3 | 2 | 100 | Tyler Ervin run for 2 yds to the SDSt 8 for a 1ST down SAN DIEGO ST Penalty, personal foul (JJ Whittaker) to the SDSt 2 for a 1ST down | -2.19 | 6.26 | 8.45 | | 8.45 |
2016 | 4 | Illinois | Purdue | 3 | 4 | 100 | Chayce Crouch pass complete to Tyler White for 7 yds to the Prdue 13 for a 1ST down PURDUE Penalty, roughing passer (Eddy Wilson) to the Prdue 3 for a 1ST down | -2.41 | 6.20 | 8.61 | | 8.61 |
2017 | 1 | Texas A&M | Louisiana | 3 | 36 | 77 | Kellen Mond pass complete to Damion Ratley for 76 yds to the LaLaf 1 for a 1ST down | -2.29 | 6.50 | 8.79 | | 8.79 |
2018 | 3 | Mississippi State | Texas A&M | 3 | 21 | 86 | Nick Fitzgerald pass complete to Stephen Guidry for 84 yds to the TexAM 2 for a 1ST down | -2.30 | 6.39 | 8.69 | | 8.69 |
2019 | 3 | Georgia Southern | Maine | 3 | 23 | 73 | Logan Wright run for 70 yds to the Maine 3 for a 1ST down | -2.05 | 5.95 | 8.00 | | 8.00 |
2020 | 3 | Fresno State | Hawai'i | 3 | 6 | 100 | HAENER, Jake pass incomplete to KELLY, Josh (KANESHIRO, Kai), PENALTY UH personal foul (KANESHIRO, Kai) 13 yards to the UH12, 1ST DOWN FS, NO PLAY, PENALTY UH unsportsmanlike conduct (KANESHIRO, Kai) 6 yards to the UH6, 1ST DOWN FS, NO PLAY. | -2.71 | 5.55 | 8.26 | | 8.26 |
For that 2011 play, it’s 4th and 58 with 93 yards to go and the pass is complete for a loss of seven yards, and yet the model considers this a play worthy of 9 expected points? The heck? Turns out this, and most of these, are actually just issues of data quality with ESPN’s play by play data.
SEASON | PERIOD | OFFENSE | DEFENSE | DOWN | DISTANCE | YTG | PLAY | EP_Pre | EP_Post | EPA | PA | PPA |
2011 | 3 | Troy | Clemson | 4 | 11 | 52 | Will Goggans punt for 52 yards for a touchback. | -0.93 | -0.52 | 0.40 | | 0.40 |
2011 | 3 | Troy | Clemson | 4 | 58 | 93 | Brynden Trawick pass complete to Michael Taylor for a loss of 7 yards to the Troy 0. | -3.73 | 5.70 | 9.43 | | 9.43 |
2011 | 3 | Troy | Clemson | 1 | 10 | 83 | Corey Robinson pass complete to Shawn Southward for 3 yards to the Troy 10, tackled by Bashaud Breeland, TROY penalty 10 yard holding accepted. | 0.32 | -0.68 | -1.00 | | -1.00 |
In this case, the play by play data has a bizarre sequence in which Troy is listed as punting, then completing a pass for a loss on 4th and 58, while still having possession on the next play with a 1st and 10. Since the expected points added calculation is based on the difference between plays, that phantom 4th and 58 play throws off the value of the next play. This is unfortunately quite common in ESPN’s play by play data, especially as we go further back.
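One way to catch rows like these before they reach any rankings is a set of simple sanity checks on down, distance, and yards to go. The sketch below is hypothetical: the field names and thresholds are my own assumptions rather than rules from the source data, and it only flags the kinds of rows seen in the tables above (a distance longer than the remaining field, 100 yards to go, a 4th and 58).

```python
from dataclasses import dataclass
from typing import List

MAX_PLAUSIBLE_DISTANCE = 50  # assumption: longer distances are almost always errors


@dataclass
class Play:
    down: int
    distance: int       # yards needed for a first down
    yards_to_goal: int  # the YTG column in the tables


def quality_flags(play: Play) -> List[str]:
    """Return the reasons a row looks suspicious (empty list if none)."""
    flags = []
    if play.down not in (1, 2, 3, 4):
        flags.append("invalid down")
    if not 1 <= play.yards_to_goal <= 99:
        flags.append("yards to goal outside 1-99")
    if play.distance > play.yards_to_goal:
        flags.append("distance exceeds yards to goal")
    if play.distance > MAX_PLAUSIBLE_DISTANCE:
        flags.append("implausibly long distance")
    return flags


# Two of the rows from the table above.
troy_phantom = Play(down=4, distance=58, yards_to_goal=93)
usf_second_and_100 = Play(down=2, distance=100, yards_to_goal=98)
print(quality_flags(troy_phantom))
print(quality_flags(usf_second_and_100))
```

Row-level checks like this won’t catch every problem (the out-of-order Troy sequence would need a comparison between consecutive plays), but they are cheap to run over the full data set.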
Having scored all individual plays, we can now roll this up to whatever level of analysis we’re interested in. We can, for instance, look at a team’s offense for each season over this time period. Continuing to select a team at random, we’ll look at Texas A&M’s offense in terms of predicted/expected points game by game over this time period. The line in navy, EPA_Average, looks at a team’s expected points added per play without including scoring plays. The line in light blue, PPA_Average, looks at all plays, including scoring.
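A minimal sketch of that roll-up, assuming each play record carries an EPA value (missing for scoring plays) and a PPA value (present for every play). The tuple layout and the toy numbers are made up for illustration and are not A&M’s real plays.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical play records: (game_id, offense, epa, ppa).
# epa is None for scoring plays, which only carry a PPA value.
plays = [
    ("g1", "Texas A&M", 0.40, 0.40),
    ("g1", "Texas A&M", -1.00, -1.00),
    ("g1", "Texas A&M", None, 2.87),
    ("g2", "Texas A&M", 0.25, 0.25),
    ("g2", "Texas A&M", None, 1.13),
]


def game_averages(plays):
    """Average EPA (non-scoring plays only) and PPA (all plays) per offense and game."""
    epa_values = defaultdict(list)
    ppa_values = defaultdict(list)
    for game_id, offense, epa, ppa in plays:
        key = (offense, game_id)
        ppa_values[key].append(ppa)
        if epa is not None:
            epa_values[key].append(epa)
    return {
        key: {
            "EPA_Average": mean(epa_values[key]) if epa_values[key] else None,
            "PPA_Average": mean(values),
        }
        for key, values in ppa_values.items()
    }


for key, averages in game_averages(plays).items():
    print(key, averages)
```

Swapping `game_id` for a season in the grouping key gives the season-level version of the same two averages.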
If we use these as general measures of the efficiency of an offense in a season/game, where does A&M place compared to all teams? What is good? I’ll plot the distribution of all teams with at least 400 plays in a season, then overlay where A&M ranks.
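As a numeric stand-in for the plot, here is a small sketch that applies the 400-play cutoff and then reports where one team’s value falls within the remaining distribution. The percentile rank is my substitute for the visual overlay, and the team names and values are invented for illustration.

```python
MIN_PLAYS = 400


def percentile_rank(value, population):
    """Share of the population at or below `value`, on a 0-100 scale."""
    return 100.0 * sum(1 for x in population if x <= value) / len(population)


# Hypothetical season summaries: (season, team, plays, ppa_per_play), made-up values.
season_summaries = [
    (2012, "Team A", 950, 0.30),
    (2012, "Team B", 880, 0.10),
    (2012, "Team C", 910, -0.05),
    (2012, "Team D", 350, 0.45),  # dropped below: fewer than 400 plays
]

eligible = [row for row in season_summaries if row[2] >= MIN_PLAYS]
values = [row[3] for row in eligible]

for season, team, n_plays, ppa in eligible:
    print(season, team, round(percentile_rank(ppa, values), 1))
```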
Based on either metric, we can see that A&M’s offenses in 2012 and 2013 were among the best in college football at the time. It also looks like A&M’s 2018 and 2020 offenses were pretty darn good as well.
Let’s look at another team that we would expect to be really strong this entire time period: Ohio State.
What about a team that’s been up and down during this time period? Some lowly team, like Texas.
What about Florida State? We would expect them to be towards the top in the Jimbo/Jameis era and then fall back to Earth during the Taggart era.
Hmm. That’s kind of interesting - I’ve heard people discuss Jimbo’s offense as lacking in explosiveness, which some argue is the difference between the expected points and the predicted points. This definitely wasn’t the case in 2013, but generally FSU had a pretty efficient and explosive offense until their collapse in 2017.
But this is only looking at one side of the ball at the season level. We can look at a team’s defense in the same way, in terms of the average expected/predicted points allowed to the offenses they face. In this case, you want your defense to have a negative value - this indicates that offenses weren’t able to generate points against your defense. I’ll look at a team’s offense and defense side by side in terms of Predicted Points Added - meaning I am including scoring plays. We would expect a good team to be one whose offense (in blue) generates more points per play than they allow on defense (in red).
I’ll look at a team like Wisconsin. They seem to generally have a pretty good defense, but struggle in terms of their ability to generate points on offense.
For a dominant team like Alabama, we should see their offensive efficiency be high and their defensive efficiency be high as well - the blue should almost always outstrip the red.
Evidently they were so good at stopping Tennessee in 2011 that it breaks my axis.
For a mediocre team, like Kansas, we should see the opposite type of graph here.
We can then aggregate to a team’s offensive performance within a full year to rate their overall offensive/defensive efficiency. We can also break this down by passing vs rushing plays on both sides of the ball.
For the purpose of evaluating a team’s defense in this analysis, I’ll flip the sign of a defense’s points, which are currently scored from the perspective of the offense, so that positive is always good for a team.
Putting this all together, I can rank a team’s offense/defense by year and then sort to see where the top teams of all time tend to rank. Here are the top 50 teams using a composite score of both, including only offenses and defenses with at least 400 plays in a season.
Rank | SEASON | TEAM | OFFENSE_PPA | DEFENSE_PPA | OVERALL |
1 | 2013 | Florida State | 0.35 | 0.19 | 0.54 |
2 | 2019 | Clemson | 0.32 | 0.18 | 0.50 |
3 | 2010 | Boise State | 0.28 | 0.21 | 0.49 |
4 | 2019 | Ohio State | 0.32 | 0.18 | 0.49 |
5 | 2011 | Alabama | 0.16 | 0.30 | 0.46 |
6 | 2010 | TCU | 0.23 | 0.22 | 0.45 |
7 | 2018 | Alabama | 0.34 | 0.10 | 0.44 |
8 | 2011 | Wisconsin | 0.32 | 0.10 | 0.42 |
9 | 2019 | Alabama | 0.32 | 0.10 | 0.42 |
10 | 2020 | BYU | 0.36 | 0.06 | 0.42 |
11 | 2013 | Alabama | 0.27 | 0.14 | 0.41 |
12 | 2020 | Alabama | 0.38 | 0.04 | 0.41 |
13 | 2010 | Ohio State | 0.17 | 0.23 | 0.40 |
14 | 2011 | LSU | 0.18 | 0.22 | 0.40 |
15 | 2012 | Alabama | 0.24 | 0.16 | 0.40 |
16 | 2016 | Alabama | 0.17 | 0.24 | 0.40 |
17 | 2017 | Alabama | 0.27 | 0.13 | 0.40 |
18 | 2018 | Clemson | 0.25 | 0.15 | 0.40 |
19 | 2020 | Clemson | 0.21 | 0.17 | 0.39 |
20 | 2013 | Baylor | 0.25 | 0.13 | 0.38 |
21 | 2016 | Michigan | 0.18 | 0.20 | 0.38 |
22 | 2016 | Ohio State | 0.19 | 0.18 | 0.37 |
23 | 2016 | Washington | 0.28 | 0.08 | 0.37 |
24 | 2013 | Louisville | 0.22 | 0.14 | 0.36 |
25 | 2019 | LSU | 0.34 | 0.02 | 0.36 |
26 | 2010 | Oregon | 0.21 | 0.13 | 0.34 |
27 | 2010 | Wisconsin | 0.27 | 0.07 | 0.34 |
28 | 2017 | Washington | 0.21 | 0.13 | 0.34 |
29 | 2011 | Houston | 0.27 | 0.06 | 0.33 |
30 | 2012 | Florida State | 0.18 | 0.15 | 0.33 |
31 | 2016 | Louisville | 0.22 | 0.10 | 0.33 |
32 | 2017 | Penn State | 0.23 | 0.10 | 0.33 |
33 | 2018 | Georgia | 0.28 | 0.05 | 0.33 |
34 | 2012 | Oregon | 0.23 | 0.09 | 0.32 |
35 | 2016 | Western Michigan | 0.31 | 0.01 | 0.32 |
36 | 2017 | Georgia | 0.21 | 0.12 | 0.32 |
37 | 2019 | Utah | 0.19 | 0.13 | 0.32 |
38 | 2020 | Buffalo | 0.34 | -0.02 | 0.32 |
39 | 2011 | Boise State | 0.23 | 0.07 | 0.31 |
40 | 2014 | Alabama | 0.24 | 0.07 | 0.31 |
41 | 2017 | Ohio State | 0.25 | 0.06 | 0.31 |
42 | 2020 | Cincinnati | 0.23 | 0.08 | 0.31 |
43 | 2020 | Ohio State | 0.30 | 0.01 | 0.31 |
44 | 2010 | Alabama | 0.19 | 0.11 | 0.30 |
45 | 2013 | Ohio State | 0.31 | -0.01 | 0.30 |
46 | 2013 | Oregon | 0.25 | 0.05 | 0.30 |
47 | 2013 | Wisconsin | 0.18 | 0.12 | 0.30 |
48 | 2014 | Michigan State | 0.19 | 0.11 | 0.30 |
49 | 2014 | Ohio State | 0.24 | 0.06 | 0.30 |
50 | 2014 | TCU | 0.17 | 0.13 | 0.30 |
And here are the 25 worst teams using the same criterion.
Rank | SEASON | TEAM | OFFENSE_PPA | DEFENSE_PPA | OVERALL |
1,380 | 2012 | UMass | -0.25 | -0.23 | -0.48 |
1,379 | 2019 | UMass | -0.13 | -0.33 | -0.46 |
1,378 | 2020 | Bowling Green | -0.18 | -0.26 | -0.44 |
1,377 | 2020 | Akron | -0.11 | -0.31 | -0.42 |
1,374 | 2010 | Memphis | -0.18 | -0.23 | -0.41 |
1,375 | 2011 | New Mexico | -0.14 | -0.27 | -0.41 |
1,376 | 2018 | Connecticut | 0.03 | -0.45 | -0.41 |
1,372 | 2011 | UNLV | -0.18 | -0.21 | -0.40 |
1,373 | 2015 | Kansas | -0.16 | -0.25 | -0.40 |
1,367 | 2012 | Idaho | -0.24 | -0.16 | -0.39 |
1,368 | 2013 | Florida International | -0.24 | -0.15 | -0.39 |
1,369 | 2013 | Miami (OH) | -0.22 | -0.17 | -0.39 |
1,370 | 2014 | SMU | -0.21 | -0.19 | -0.39 |
1,371 | 2017 | UTEP | -0.19 | -0.21 | -0.39 |
1,366 | 2010 | New Mexico | -0.22 | -0.16 | -0.38 |
1,365 | 2014 | Eastern Michigan | -0.16 | -0.21 | -0.37 |
1,363 | 2012 | Colorado | -0.14 | -0.22 | -0.36 |
1,364 | 2020 | Kansas | -0.18 | -0.18 | -0.36 |
1,358 | 2010 | New Mexico State | -0.15 | -0.20 | -0.35 |
1,359 | 2013 | Eastern Michigan | -0.04 | -0.31 | -0.35 |
1,360 | 2013 | Idaho | -0.12 | -0.23 | -0.35 |
1,361 | 2015 | North Texas | -0.12 | -0.24 | -0.35 |
1,362 | 2019 | Akron | -0.24 | -0.11 | -0.35 |
1,357 | 2020 | Louisiana Monroe | -0.10 | -0.24 | -0.34 |
1,351 | 2011 | Florida Atlantic | -0.26 | -0.07 | -0.33 |
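In both tables, OVERALL matches OFFENSE_PPA plus the sign-flipped DEFENSE_PPA up to rounding (0.35 + 0.19 = 0.54 for 2013 Florida State). A sketch of that computation is below. It assumes each play is a `(season, offense, defense, ppa)` tuple scored from the offense’s perspective and that the composite is a plain sum; the function name and record layout are mine, not the notebook’s.

```python
from collections import defaultdict
from statistics import mean

MIN_PLAYS = 400  # cutoff applied separately to offensive and defensive plays


def team_season_ratings(plays, min_plays=MIN_PLAYS):
    """Rank team-seasons by offense PPA plus sign-flipped defense PPA."""
    offense = defaultdict(list)
    defense = defaultdict(list)
    for season, off_team, def_team, ppa in plays:
        offense[(season, off_team)].append(ppa)
        defense[(season, def_team)].append(-ppa)  # flip so positive is good for the defense
    ratings = []
    for key in offense.keys() & defense.keys():
        if len(offense[key]) < min_plays or len(defense[key]) < min_plays:
            continue
        off_ppa, def_ppa = mean(offense[key]), mean(defense[key])
        ratings.append({
            "season": key[0],
            "team": key[1],
            "OFFENSE_PPA": off_ppa,
            "DEFENSE_PPA": def_ppa,
            "OVERALL": off_ppa + def_ppa,
        })
    return sorted(ratings, key=lambda row: row["OVERALL"], reverse=True)


# Toy example with two made-up plays and the cutoff lowered to 1.
toy = [(2013, "A", "B", 0.5), (2013, "B", "A", -0.2)]
for row in team_season_ratings(toy, min_plays=1):
    print(row)
```

A simple sum weights offense and defense equally per play, which is one choice among several; weighting by play counts would be another.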
We can visualize all of this by placing every team based on its overall offensive/defensive efficiency.
We can also break this down by conference year over year. For instance, where do we place SEC teams in each of these seasons?
What about the Big Ten?
We can break this down further to examine a team’s expected points based on pass/run.
Filtering to teams with at least 200 such plays in a season, which teams had the most efficient passing offense in terms of predicted points added?
SEASON | TEAM | OFFENSE_PPA_Pass | OFFENSE_Plays_Pass |
2017 | Oklahoma | 0.68 | 412 |
2020 | Alabama | 0.65 | 365 |
2018 | Alabama | 0.65 | 382 |
2018 | Oklahoma | 0.64 | 375 |
2013 | LSU | 0.60 | 306 |
2013 | Florida State | 0.59 | 407 |
2016 | Oklahoma | 0.58 | 372 |
2019 | Alabama | 0.57 | 394 |
2013 | Louisville | 0.56 | 393 |
2019 | LSU | 0.54 | 500 |
2010 | Auburn | 0.53 | 279 |
2011 | Baylor | 0.53 | 409 |
2010 | Boise State | 0.53 | 387 |
2013 | Baylor | 0.52 | 401 |
2016 | Western Michigan | 0.52 | 346 |
2020 | BYU | 0.51 | 353 |
2020 | Coastal Carolina | 0.51 | 265 |
2011 | Wisconsin | 0.50 | 326 |
2019 | Oklahoma | 0.50 | 365 |
2019 | Minnesota | 0.49 | 317 |
2014 | Oregon | 0.49 | 427 |
2016 | Toledo | 0.48 | 373 |
2020 | Florida | 0.48 | 462 |
2019 | Ohio State | 0.47 | 387 |
2018 | Georgia | 0.47 | 333 |
What about run offense?
SEASON | TEAM | OFFENSE_PPA_Run | OFFENSE_Plays_Run |
2016 | South Florida | 0.43 | 453 |
2020 | Buffalo | 0.41 | 257 |
2016 | Louisville | 0.40 | 437 |
2014 | Georgia Southern | 0.40 | 620 |
2017 | Louisville | 0.39 | 421 |
2018 | Oklahoma | 0.38 | 471 |
2019 | Clemson | 0.38 | 492 |
2020 | North Carolina | 0.37 | 424 |
2018 | Clemson | 0.37 | 475 |
2015 | Texas Tech | 0.37 | 410 |
2020 | Ohio State | 0.37 | 248 |
2018 | Ohio | 0.36 | 491 |
2013 | Ohio State | 0.35 | 571 |
2014 | Navy | 0.35 | 631 |
2016 | Navy | 0.35 | 657 |
2014 | Marshall | 0.35 | 454 |
2017 | Arizona | 0.35 | 547 |
2016 | New Mexico | 0.35 | 608 |
2019 | Navy | 0.34 | 689 |
2017 | Alabama | 0.34 | 502 |
2010 | Nevada | 0.33 | 618 |
2017 | Army | 0.33 | 689 |
2017 | Ohio State | 0.32 | 524 |
2015 | Navy | 0.32 | 654 |
2019 | Louisiana | 0.32 | 532 |
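The pass/run tables above depend on labelling each play from its text description. Below is a deliberately naive sketch of that step plus the grouped average with the 200-play cutoff. The keyword rules are my own guess based on the play descriptions shown in this notebook (they ignore sacks, scrambles, and other edge cases), and the function names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean


def classify_play(play_text):
    """Label a play 'pass', 'run', or 'other' from keywords in its description."""
    text = play_text.lower()
    if re.search(r"\bpass\b", text):
        return "pass"
    if re.search(r"\b(rush|run)\b", text):
        return "run"
    return "other"


def offense_ppa_by_type(plays, min_plays=200):
    """plays: iterable of (season, offense, play_text, ppa) tuples.

    Returns (season, team, play_type, mean_ppa, n_plays) rows, best first,
    keeping only pass/run groups with at least `min_plays` plays.
    """
    grouped = defaultdict(list)
    for season, offense, play_text, ppa in plays:
        play_type = classify_play(play_text)
        if play_type != "other":
            grouped[(season, offense, play_type)].append(ppa)
    rows = [
        (season, team, play_type, mean(values), len(values))
        for (season, team, play_type), values in grouped.items()
        if len(values) >= min_plays
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


print(classify_play("Eddie Lacy rush for 2 yards for a TOUCHDOWN."))  # run
```

Grouping on the defense instead of the offense, and flipping the sign, would give the defensive versions of these tables.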
Flipping this around, which defenses yielded the most points against the pass and against the run?
SEASON | TEAM | DEFENSE_PPA_Pass | DEFENSE_Plays_Pass |
2018 | Connecticut | -0.65 | 328 |
2013 | Air Force | -0.56 | 327 |
2013 | Eastern Michigan | -0.51 | 310 |
2015 | Rice | -0.50 | 288 |
2017 | East Carolina | -0.50 | 342 |
2018 | Georgia State | -0.48 | 304 |
2013 | California | -0.47 | 432 |
2013 | New Mexico | -0.46 | 335 |
2010 | Eastern Michigan | -0.46 | 293 |
2019 | UMass | -0.45 | 326 |
2013 | UAB | -0.44 | 356 |
2010 | Memphis | -0.43 | 331 |
2013 | Troy | -0.42 | 402 |
2012 | West Virginia | -0.41 | 423 |
2020 | Louisiana Monroe | -0.40 | 228 |
2017 | Oregon State | -0.40 | 354 |
2017 | Tulsa | -0.40 | 334 |
2015 | Ball State | -0.39 | 420 |
2013 | UTEP | -0.39 | 268 |
2017 | Kansas | -0.39 | 398 |
2011 | New Mexico | -0.38 | 346 |
2011 | Ball State | -0.38 | 386 |
2013 | Army | -0.37 | 251 |
2017 | Navy | -0.37 | 297 |
2010 | Rice | -0.37 | 427 |
SEASON | TEAM | DEFENSE_PPA_Run | DEFENSE_Plays_Run |
2018 | Connecticut | -0.48 | 497 |
2020 | UNLV | -0.45 | 225 |
2018 | Oregon State | -0.42 | 472 |
2015 | Eastern Michigan | -0.39 | 563 |
2019 | UMass | -0.39 | 528 |
2020 | Akron | -0.38 | 226 |
2020 | Minnesota | -0.37 | 209 |
2018 | Bowling Green | -0.35 | 550 |
2014 | Georgia State | -0.35 | 518 |
2014 | New Mexico State | -0.34 | 534 |
2015 | Idaho | -0.34 | 449 |
2020 | Ole Miss | -0.34 | 296 |
2016 | California | -0.32 | 504 |
2019 | New Mexico State | -0.32 | 491 |
2014 | Florida Atlantic | -0.32 | 307 |
2019 | Connecticut | -0.32 | 434 |
2013 | New Mexico | -0.32 | 477 |
2020 | Duke | -0.32 | 386 |
2016 | Arkansas | -0.32 | 374 |
2018 | Coastal Carolina | -0.32 | 424 |
2014 | Iowa State | -0.30 | 499 |
2018 | Illinois | -0.30 | 467 |
2013 | New Mexico State | -0.30 | 516 |
2014 | Rutgers | -0.29 | 418 |
2016 | Oregon | -0.29 | 485 |
What about the best defenses? Rutgers in 2012 with the best run defense? Really? Keep in mind that these ratings are not adjusted for opponent strength, though evidently Rutgers did have a bunch of defenders drafted in 2013, so..?
SEASON | TEAM | DEFENSE_PPA_Pass | DEFENSE_Plays_Pass |
2011 | Alabama | 0.47 | 321 |
2019 | Clemson | 0.41 | 374 |
2019 | Ohio State | 0.40 | 399 |
2016 | Michigan | 0.39 | 339 |
2017 | Wisconsin | 0.35 | 403 |
2010 | Boise State | 0.34 | 390 |
2013 | Florida State | 0.34 | 401 |
2012 | Bowling Green | 0.34 | 405 |
2012 | Fresno State | 0.34 | 387 |
2018 | Michigan | 0.34 | 335 |
2016 | Alabama | 0.33 | 447 |
2020 | Northwestern | 0.33 | 305 |
2016 | Ohio State | 0.33 | 381 |
2012 | Arizona State | 0.33 | 429 |
2014 | Clemson | 0.32 | 386 |
2010 | TCU | 0.31 | 327 |
2017 | Michigan | 0.31 | 329 |
2011 | South Carolina | 0.31 | 354 |
2010 | Miami | 0.30 | 332 |
2014 | Louisville | 0.30 | 422 |
2015 | Alabama | 0.30 | 444 |
2012 | Boise State | 0.30 | 372 |
2015 | Northwestern | 0.29 | 447 |
2010 | Nebraska | 0.28 | 400 |
2020 | Cincinnati | 0.28 | 338 |
SEASON | TEAM | DEFENSE_PPA_Run | DEFENSE_Plays_Run |
2012 | Rutgers | 0.30 | 408 |
2011 | Alabama | 0.30 | 306 |
2012 | Michigan State | 0.28 | 340 |
2013 | Michigan State | 0.27 | 356 |
2013 | Louisville | 0.27 | 337 |
2010 | Ohio State | 0.26 | 352 |
2011 | TCU | 0.24 | 406 |
2015 | Boston College | 0.23 | 378 |
2011 | Connecticut | 0.23 | 351 |
2013 | Wisconsin | 0.23 | 357 |
2012 | BYU | 0.23 | 343 |
2013 | Oklahoma State | 0.23 | 441 |
2012 | Alabama | 0.22 | 388 |
2011 | LSU | 0.22 | 389 |
2016 | Alabama | 0.22 | 354 |
2010 | West Virginia | 0.22 | 353 |
2013 | Baylor | 0.22 | 474 |
2019 | Georgia | 0.22 | 329 |
2011 | Cincinnati | 0.22 | 381 |
2013 | Tulane | 0.22 | 406 |
2013 | Iowa | 0.22 | 395 |
2010 | Arizona State | 0.22 | 411 |
2010 | Utah | 0.21 | 387 |
2018 | Michigan State | 0.21 | 331 |
2013 | Utah State | 0.21 | 466 |
Putting this all together, we can break down teams overall as before, only now explicitly looking at teams based on their pass and run offense/defense efficiency. Florida State 2013 still jumps to the top; evidently their passing defense that year was fantastic in addition to their passing offense. The picture is mostly unchanged from looking at just the overall numbers, but breaking it down can reveal which teams were balanced and which were pass or run heavy.
SEASON | TEAM | OFFENSE_PPA_Pass | OFFENSE_PPA_Run | DEFENSE_PPA_Pass | DEFENSE_PPA_Run | Overall_PPA |
2013 | Florida State | 0.59 | 0.26 | 0.34 | 0.19 | 1.38 |
2019 | Ohio State | 0.47 | 0.31 | 0.40 | 0.10 | 1.28 |
2019 | Clemson | 0.31 | 0.38 | 0.41 | 0.15 | 1.25 |
2010 | Boise State | 0.53 | 0.17 | 0.34 | 0.18 | 1.22 |
2018 | Alabama | 0.65 | 0.24 | 0.22 | 0.09 | 1.20 |
2011 | Alabama | 0.20 | 0.16 | 0.47 | 0.30 | 1.13 |
2010 | TCU | 0.37 | 0.22 | 0.31 | 0.19 | 1.09 |
2013 | Baylor | 0.52 | 0.13 | 0.14 | 0.22 | 1.01 |
2018 | Clemson | 0.26 | 0.37 | 0.17 | 0.21 | 1.01 |
2010 | Ohio State | 0.28 | 0.16 | 0.28 | 0.26 | 0.98 |
2011 | Wisconsin | 0.50 | 0.27 | 0.10 | 0.10 | 0.97 |
2013 | Louisville | 0.56 | -0.02 | 0.15 | 0.27 | 0.96 |
2019 | Alabama | 0.57 | 0.19 | 0.19 | 0.01 | 0.96 |
2017 | Alabama | 0.21 | 0.34 | 0.25 | 0.14 | 0.94 |
2016 | Alabama | 0.13 | 0.26 | 0.33 | 0.22 | 0.94 |
2016 | Michigan | 0.19 | 0.21 | 0.39 | 0.14 | 0.93 |
2019 | LSU | 0.54 | 0.24 | 0.05 | 0.10 | 0.93 |
2016 | Louisville | 0.22 | 0.40 | 0.16 | 0.14 | 0.92 |
2020 | Clemson | 0.29 | 0.24 | 0.21 | 0.17 | 0.91 |
2012 | Alabama | 0.26 | 0.23 | 0.20 | 0.22 | 0.91 |
2011 | LSU | 0.25 | 0.15 | 0.28 | 0.22 | 0.90 |
2020 | Alabama | 0.65 | 0.21 | 0.07 | -0.03 | 0.90 |
2013 | Alabama | 0.39 | 0.19 | 0.10 | 0.21 | 0.89 |
2019 | Utah | 0.45 | 0.12 | 0.21 | 0.09 | 0.87 |
2010 | Auburn | 0.53 | 0.26 | -0.05 | 0.12 | 0.86 |
We can also look at individual teams year over year to see how their offense, defense, and overall efficiency changed.
For instance, Oregon was great in the early 2010s then had a few years in which they were down.
SEASON | TEAM | OFFENSE_PPA_Pass | OFFENSE_PPA_Run | DEFENSE_PPA_Pass | DEFENSE_PPA_Run | Overall_PPA |
2010 | Oregon | 0.27 | 0.19 | 0.12 | 0.21 | 0.79 |
2011 | Oregon | 0.19 | 0.27 | 0.15 | 0.04 | 0.65 |
2012 | Oregon | 0.29 | 0.27 | 0.21 | 0.03 | 0.80 |
2013 | Oregon | 0.39 | 0.27 | 0.12 | 0.02 | 0.80 |
2014 | Oregon | 0.49 | 0.28 | -0.04 | -0.01 | 0.72 |
2015 | Oregon | 0.25 | 0.31 | -0.11 | -0.22 | 0.23 |
2016 | Oregon | 0.26 | 0.20 | -0.21 | -0.29 | -0.04 |
2017 | Oregon | 0.09 | 0.21 | 0.07 | 0.04 | 0.41 |
2018 | Oregon | 0.20 | 0.17 | -0.03 | 0.02 | 0.36 |
2019 | Oregon | 0.27 | 0.17 | 0.10 | 0.08 | 0.62 |
2020 | Oregon | 0.38 | 0.23 | -0.09 | -0.08 | 0.44 |
Florida State had that amazing year, but has really fallen back after Jimbo’s bad year and eventual departure.
SEASON | TEAM | OFFENSE_PPA_Pass | OFFENSE_PPA_Run | DEFENSE_PPA_Pass | DEFENSE_PPA_Run | Overall_PPA |
2010 | Florida State | 0.09 | 0.12 | 0.05 | 0.06 | 0.32 |
2011 | Florida State | 0.18 | -0.02 | 0.14 | 0.19 | 0.49 |
2012 | Florida State | 0.23 | 0.21 | 0.26 | 0.14 | 0.84 |
2013 | Florida State | 0.59 | 0.26 | 0.34 | 0.19 | 1.38 |
2014 | Florida State | 0.23 | 0.08 | -0.01 | -0.05 | 0.25 |
2015 | Florida State | 0.21 | 0.15 | 0.14 | 0.00 | 0.50 |
2016 | Florida State | 0.13 | 0.24 | 0.00 | -0.07 | 0.30 |
2017 | Florida State | -0.08 | 0.03 | 0.08 | 0.00 | 0.03 |
2018 | Florida State | -0.03 | -0.13 | -0.04 | 0.01 | -0.19 |
2019 | Florida State | 0.07 | 0.08 | -0.07 | 0.00 | 0.08 |
2020 | Florida State | -0.23 | 0.30 | -0.28 | -0.16 | -0.37 |
Wisconsin has generally been pretty good but not quite elite during this time period, and people usually hound them for their passing game lately. How do they compare year over year?
SEASON | TEAM | OFFENSE_PPA_Pass | OFFENSE_PPA_Run | DEFENSE_PPA_Pass | DEFENSE_PPA_Run | Overall_PPA |
2010 | Wisconsin | 0.46 | 0.25 | 0.09 | 0.05 | 0.85 |
2011 | Wisconsin | 0.50 | 0.27 | 0.10 | 0.10 | 0.97 |
2012 | Wisconsin | 0.11 | 0.11 | 0.07 | 0.07 | 0.36 |
2013 | Wisconsin | 0.13 | 0.27 | 0.08 | 0.23 | 0.71 |
2014 | Wisconsin | 0.03 | 0.30 | 0.20 | 0.04 | 0.57 |
2015 | Wisconsin | 0.03 | 0.03 | 0.22 | 0.10 | 0.38 |
2016 | Wisconsin | 0.16 | 0.03 | 0.15 | 0.13 | 0.47 |
2017 | Wisconsin | 0.18 | 0.13 | 0.35 | 0.10 | 0.76 |
2018 | Wisconsin | 0.00 | 0.29 | 0.00 | -0.08 | 0.21 |
2019 | Wisconsin | 0.31 | 0.25 | 0.18 | 0.05 | 0.79 |
2020 | Wisconsin | -0.07 | 0.12 | 0.14 | 0.22 | 0.41 |
What about a historically weaker team that has been on the rise lately? Iowa State?
SEASON | TEAM | OFFENSE_PPA_Pass | OFFENSE_PPA_Run | DEFENSE_PPA_Pass | DEFENSE_PPA_Run | Overall_PPA |
2010 | Iowa State | -0.12 | -0.02 | -0.11 | -0.02 | -0.27 |
2011 | Iowa State | -0.18 | -0.03 | 0.05 | -0.03 | -0.19 |
2012 | Iowa State | 0.02 | -0.07 | -0.11 | 0.07 | -0.09 |
2013 | Iowa State | -0.03 | -0.13 | -0.14 | -0.06 | -0.36 |
2014 | Iowa State | -0.10 | 0.01 | -0.11 | -0.30 | -0.50 |
2015 | Iowa State | -0.05 | 0.10 | -0.21 | -0.12 | -0.28 |
2016 | Iowa State | 0.16 | 0.08 | -0.14 | -0.19 | -0.09 |
2017 | Iowa State | 0.18 | 0.01 | -0.05 | 0.07 | 0.21 |
2018 | Iowa State | 0.12 | 0.04 | -0.08 | 0.02 | 0.10 |
2019 | Iowa State | 0.28 | 0.02 | -0.03 | -0.03 | 0.24 |
2020 | Iowa State | 0.22 | 0.18 | -0.01 | 0.05 | 0.44 |
And of course the team I keep randomly selecting, Texas A&M.
SEASON | TEAM | OFFENSE_PPA_Pass | OFFENSE_PPA_Run | DEFENSE_PPA_Pass | DEFENSE_PPA_Run | Overall_PPA |
2010 | Texas A&M | 0.04 | 0.03 | 0.18 | 0.21 | 0.46 |
2011 | Texas A&M | 0.14 | 0.11 | 0.00 | 0.02 | 0.27 |
2012 | Texas A&M | 0.28 | 0.29 | 0.07 | 0.04 | 0.68 |
2013 | Texas A&M | 0.38 | 0.22 | -0.09 | -0.14 | 0.37 |
2014 | Texas A&M | 0.14 | 0.16 | -0.07 | -0.19 | 0.04 |
2015 | Texas A&M | -0.07 | 0.11 | 0.15 | -0.08 | 0.11 |
2016 | Texas A&M | 0.06 | 0.20 | -0.03 | -0.04 | 0.19 |
2017 | Texas A&M | 0.02 | 0.04 | 0.00 | -0.03 | 0.03 |
2018 | Texas A&M | 0.15 | 0.17 | -0.22 | 0.07 | 0.17 |
2019 | Texas A&M | 0.06 | 0.20 | 0.03 | -0.01 | 0.28 |
2020 | Texas A&M | 0.26 | 0.16 | -0.09 | 0.05 | 0.38 |